7-Methoxy-4-methylcoumarin: Standard Molar Enthalpy of Formation Prediction in the Gas Phase Using Machine Learning and Its Comparison to the Experimental Data

The standard molar enthalpy of formation in the crystalline phase at 298.15 K, ΔfHm°(cr), of 7-methoxy-4-methylcoumarin (7M4MC) was determined experimentally by combustion calorimetry and was also estimated by a traditional linear regression. The standard molar enthalpy of sublimation was obtained from the standard molar enthalpy of fusion and the standard molar enthalpy of vaporization, which were measured by differential scanning calorimetry and thermogravimetry, respectively. From these results, the standard molar enthalpy of formation in the gas phase at 298.15 K, ΔfHm°(g), was calculated. In addition, machine learning (ML) was used to predict the standard molar enthalpy of formation in the gas phase of 7M4MC from an experimental data set containing three kinds of functional groups: esters, coumarins, and aromatic compounds. The prediction was performed with multiple linear regression and stochastic gradient descent algorithms, with an R² of 0.99. The resulting models were used to compare predicted and experimental values for coumarins, giving an average error of 9.0%. Likewise, four homodesmic reactions were proposed and evaluated with the multiple linear regression algorithm, with good results.


■ INTRODUCTION
Coumarins are heterocyclic compounds containing a lactone group. These compounds represent a wide range of natural, pharmaceutical, and phytochemical products. The interest in natural coumarins has increased significantly over time, leading to their discovery in plant species with different chemical structures and phases (crystalline and gas).−4 Moreover, coumarins and their derivatives exhibit antimicrobial,5,6 anti-inflammatory,7,8 antispasmodic, antiviral,9,10 antioxidant,11 and enzyme inhibitor properties.12,13−16 Despite these many potential applications, there are very few reports on their thermochemical properties. Among these properties, one of the most important is the standard molar enthalpy of formation, which helps to explain and support the synthesis process.17,18 Thermal and calorimetric techniques, namely differential scanning calorimetry (DSC), thermogravimetry, and combustion calorimetry, are commonly used to determine this property experimentally.19,20 Furthermore, it is possible to predict this property with computational techniques such as machine learning algorithms.21,22 This work presents the experimental results for 7-methoxy-4-methylcoumarin (7M4MC), shown in Figure 1, whose enthalpy of formation in the gas phase was also predicted using multiple linear regression (MLR) and stochastic gradient descent regression (SGD) models23,24 based on Benson's group additivity method.25

[...] bomb was used to determine the combustion energies. This instrument was calibrated using benzoic acid (NIST Standard Reference Material 39j) with a massic energy of combustion of −(26434.0 ± 3.0) J g⁻¹ (the uncertainty corresponds to the expanded uncertainty), which was corrected using the equation of Coops et al.27 The calorimetric equivalent, ε(calor) = (1281.2 ± 0.8) J K⁻¹ (the uncertainty is twice the standard deviation of the mean), was calculated from six combustion experiments at a pressure of 3.04 MPa of high-purity gaseous oxygen (Air Liquide Corp., mass fraction of 0.99999) with 0.1 cm³ of deionized water.28 To maintain conditions similar to those of the reference material, 7M4MC was oxidized using the same parameters. The cotton-thread fuse (C1.000H1.742O0.921) has a specific energy of combustion of −(16945.2 ± 4.2) J g⁻¹ (the uncertainty is the standard deviation of the mean). The combustion energies under standard conditions were determined by applying the Washburn corrections.27 The physical properties of the compounds are summarized in Table 1,29 where the atomic weights of the elements are those reported by IUPAC in 2021.30 To calculate the energy change associated with the pressure, the estimated value (∂u/∂p)T = −0.2 J g⁻¹ MPa⁻¹ at 298.15 K was used, which is a typical value for most solid organic compounds.31

Thermogravimetry. The indirect method of thermogravimetric analysis (TGA) was used to determine the vaporization enthalpy using the Langmuir equation.
$$\frac{1}{A}\frac{dm}{dt} = p\,\gamma\sqrt{\frac{M}{2\pi R T}} \tag{1}$$
where (dm/dt) is the rate of mass loss, A is the area subjected to the vaporization process, T is the temperature, p is the vapor pressure, R is the ideal gas constant, M is the molar mass of the compound, and γ is a vaporization constant. Combining the Clausius−Clapeyron equation with eq 1 yields the expression used to calculate the enthalpy of vaporization
$$\ln v = B - \frac{\Delta_{l}^{g}H_{m}}{R T} \tag{2}$$
where
$$v = \left(\frac{1}{A}\right)\left(\frac{dm}{dt}\right)\left(\frac{T}{M}\right)^{1/2}$$
and B includes the integration constant and the term γ/(2πR)^{1/2}. Using eq 2, the vaporization enthalpies were obtained from a linear fit of ln v versus 1/T. A TA Instruments Q500 device, previously calibrated for mass and temperature, was used to record dm/dt with high precision. The thermogravimetric system was tested with phenanthrene and pyrene secondary standards (J.T. Baker). The standard molar enthalpies of vaporization at 298.15 K were (77.9 ± 1.4) kJ mol⁻¹ for phenanthrene and (86.4 ± 1.4) kJ mol⁻¹ for pyrene (Tables S1−S3 in the Supporting Information). The calculated enthalpies are consistent with those reported in the literature.32

■ COMPUTATIONAL DETAILS
MLR Model. The MLR model is a versatile statistical model for evaluating the correlation between a continuous target and its predictors.33 The predictors can be continuous, categorical, or derived fields, so nonlinear relationships are also supported. The model is considered linear because it consists of additive terms, where each term is a predictor multiplied by an estimated coefficient (βi) (see eq 3).
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{3}$$
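As an illustration of the additive form referred to as eq 3, the following minimal sketch fits an MLR model by ordinary least squares through the normal equations. It is not the authors' code: the function names `fit_mlr` and `predict`, the toy group counts, and the coefficients used to generate the targets are all hypothetical and chosen only so that the fit can be checked by eye.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bn*xn.

    Returns [b0, b1, ..., bn]. Solves the normal equations with
    Gauss-Jordan elimination, so it only suits small, well-posed problems.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for c in range(k):
        # Partial pivoting keeps the elimination numerically stable
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(k):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][k] for i in range(k)]


def predict(beta, x):
    """Evaluate the fitted additive model for one feature vector x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Hypothetical toy data: each row counts two structural groups in a molecule.
# Targets are generated from made-up coefficients (-10, -40, -150), not real data.
X = [[1, 0], [2, 0], [1, 1], [3, 1], [2, 2]]
y = [-10.0 - 40.0 * a - 150.0 * b for a, b in X]
beta = fit_mlr(X, y)
estimate = predict(beta, [2, 1])
```

In a group-additivity setting, each predictor would be the number of times a given group occurs in the molecule, and each fitted coefficient would play the role of that group's contribution.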
The constant term (intercept, β0) is also usually added to the model.34 Multiple regression models can be used to predict the value of the dependent variable or to assess the influence that the predictors have on it (the latter should be analyzed with caution so as not to misinterpret cause and effect).35 The model relates a dependent variable (y) to n regressor variables (Xn) and, finally, a random error term (ε) that collects all the factors that cannot be controlled and are associated with chance.36 It is important to bear in mind that the magnitude of each partial regression coefficient depends on the units in which the predictor variable is measured, so its magnitude does not indicate the importance of the predictor. To determine the impact of each variable on the model, the standardized partial coefficients are used.37

Stochastic Gradient Descent Regression. The SGD algorithm fits the same kind of straight-line formula, but it does so by minimizing a convex function.38 The starting point is arbitrary and serves only to evaluate the initial performance. From that point, the derivative (or slope) is determined and used to update the parameters, i.e., the weights and bias. The slope is steepest at the starting point; as new parameters are generated, it should gradually decrease until the lowest point of the curve, known as the convergence point, is reached.39 In each training epoch, SGD goes through the examples in the data set and updates the parameters after each training example, one at a time.40

■ RESULTS AND DISCUSSION
Experimental Results. Table 2 shows four experimental results for 7M4MC: the purity, the melting point, the enthalpy of phase change, and the heat capacity at constant pressure, together with their experimental uncertainties.
The molar heat capacity was calculated from 273.15 to 388.15 K using the results obtained by DSC; these data are shown in Table S4 in the Supporting Information. For this calculation, eq 4 was used, which was obtained from a polynomial regression of the heat capacity versus temperature data.
On the other hand, Ngoc Toan41 reported a melting temperature ranging from 432 to 435 K, which differs from our value by 0.37%. It is worth mentioning that no value for the enthalpy of fusion of 7M4MC has been reported in the literature in the past decade.
Table 3 shows the combustion results for 7M4MC. The complete data sets for the six combustion experiments are given in Table S5 in the Supporting Information.
The average combustion energy, enthalpy, and uncertainty at 298.15 K and 0.1 MPa are shown in Table 4. To calculate the standard molar enthalpy of formation ΔfHm°(cr) from the molar enthalpy of combustion ΔcHm°(cr), the molar enthalpies of formation of CO2(g) and H2O(l) were taken as −(393.51 ± 0.13) and −(285.83 ± 0.04) kJ mol⁻¹ at 298.15 K, respectively.42

Table 5 shows the vaporization enthalpy results for 7M4MC at Tm = 463.15 K (where Tm is the mean temperature). Four series of experiments were performed, and the average of the obtained values is reported (Figures S1−S4 in the Supporting Information).
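The way a vaporization enthalpy follows from a thermogravimetric run can be sketched as a straight-line fit of ln v against 1/T, where v = (1/A)(dm/dt)(T/M)^{1/2} and the slope equals −Δ_l^g H_m/R. The sketch below is only an illustration under stated assumptions: the function names, the temperature grid, the unit area, and the "true" enthalpy and intercept used to generate synthetic mass-loss rates are hypothetical and are not the measured data of this work.

```python
import math

R = 8.314462618  # ideal gas constant, J K^-1 mol^-1


def ln_v(rate, area, temperature, molar_mass):
    """ln[(1/A)(dm/dt)(T/M)^(1/2)] for one point; rate is the magnitude of dm/dt."""
    return math.log((rate / area) * math.sqrt(temperature / molar_mass))


def vaporization_enthalpy(temps, rates, area, molar_mass):
    """Least-squares line of ln v versus 1/T.

    Returns (enthalpy in kJ/mol, intercept B). The slope of the line is -dH/R.
    """
    xs = [1.0 / t for t in temps]
    ys = [ln_v(r, area, t, molar_mass) for t, r in zip(temps, rates)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope * R / 1000.0, intercept


# Synthetic check with hypothetical values (not experimental data):
TRUE_DH = 80.0       # kJ/mol, assumed only to generate the test points
TRUE_B = 20.0        # assumed intercept
AREA = 1.0           # arbitrary units; it only shifts the intercept
MOLAR_MASS = 190.2   # g/mol, approximate molar mass of C11H10O3
temps = [443.15, 453.15, 463.15, 473.15, 483.15]  # K, hypothetical grid
rates = [
    AREA * math.exp(TRUE_B - TRUE_DH * 1000.0 / (R * t)) / math.sqrt(t / MOLAR_MASS)
    for t in temps
]
dH, B = vaporization_enthalpy(temps, rates, AREA, MOLAR_MASS)
```

Because the synthetic points lie exactly on the line, the fit recovers the assumed enthalpy and intercept; with real data the scatter of the points would feed into the uncertainty of the slope.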
Table 6 contains the calculation of the enthalpy of sublimation at 298.15 K; in addition, this table presents the enthalpies of fusion and vaporization under the experimental conditions. The adjustment to 298.15 K was made by applying eqs 5−7.43,44 The uncertainties correspond to the expanded uncertainty with a level of confidence of 95%, including the uncertainty of calibration and u(T) = 0.1 K. The experiments were carried out under average atmospheric pressure (78.8 kPa), u(p) = 1 kPa.
The enthalpy of sublimation was calculated by adding the enthalpy of vaporization and the enthalpy of fusion at 298.15 K. The enthalpy of sublimation of this compound has not been reported elsewhere.
The standard molar enthalpy of formation in the gas phase was then obtained as the sum of the standard molar enthalpy of formation in the crystalline phase and the enthalpy of sublimation, as seen in Table 7.
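The chain of steps from combustion to the gas phase can be summarized in a short sketch. It assumes the idealized combustion reaction C11H10O3(cr) + 12 O2(g) → 11 CO2(g) + 5 H2O(l), which follows from the molecular formula of 7M4MC, and uses the CO2(g) and H2O(l) values quoted in the text. The function names and the three input numbers at the bottom are hypothetical placeholders, not the values measured in this work.

```python
DFH_CO2_G = -393.51  # kJ/mol at 298.15 K, value quoted in the text
DFH_H2O_L = -285.83  # kJ/mol at 298.15 K, value quoted in the text


def formation_cr_from_combustion(dc_h, n_carbon=11, n_hydrogen=10):
    """Hess's law for CaHbOc(cr) + O2 -> a CO2(g) + (b/2) H2O(l).

    dfH(cr) = a*dfH(CO2) + (b/2)*dfH(H2O) - dcH, all in kJ/mol.
    """
    return n_carbon * DFH_CO2_G + (n_hydrogen / 2.0) * DFH_H2O_L - dc_h


def sublimation_enthalpy(d_fus_298, d_vap_298):
    """Enthalpy of sublimation as fusion plus vaporization, both at 298.15 K."""
    return d_fus_298 + d_vap_298


def formation_gas(dfh_cr, d_sub):
    """Gas-phase enthalpy of formation from the crystalline value and sublimation."""
    return dfh_cr + d_sub


# Placeholder inputs chosen only to exercise the functions (NOT measured values):
dc_h_placeholder = -5260.0   # kJ/mol
d_fus_placeholder = 30.0     # kJ/mol, already adjusted to 298.15 K
d_vap_placeholder = 80.0     # kJ/mol, already adjusted to 298.15 K

dfh_cr = formation_cr_from_combustion(dc_h_placeholder)
d_sub = sublimation_enthalpy(d_fus_placeholder, d_vap_placeholder)
dfh_g = formation_gas(dfh_cr, d_sub)
```

The sketch leaves out the uncertainty propagation and the heat-capacity adjustments of the phase-change enthalpies to 298.15 K, which the text handles through eqs 5−7.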
Theoretical Results. To assess the precision of the experimentally obtained values, machine learning was used. To predict the enthalpy of formation of 7M4MC in the gas phase, a data set was created based on the separation into functional groups proposed by Benson.23 For this analysis, compounds of the ester family were considered because the ester is the main functional group present in coumarins.
From a literature review, a data set of 84 experimental values was obtained, and the data were separated into training and testing sets using the hold-out method (70/30) with seed 204, so that the values obtained in this work are reproducible. The evaluation metrics are shown in Table 8. Figure 2 compares the experimental and predicted values for all compounds, with the line of perfect fit included.
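A seeded hold-out split of the kind described here can be sketched with the standard library alone. The software used by the authors is not stated in this section, so the function below (`holdout_split`, a hypothetical name) will not reproduce their exact partition; it only shows why fixing the seed makes a 70/30 split repeatable.

```python
import random


def holdout_split(records, test_fraction=0.30, seed=204):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    rng = random.Random(seed)          # private generator, so the split is repeatable
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = round(len(records) * test_fraction)
    test = [records[i] for i in indices[:n_test]]
    train = [records[i] for i in indices[n_test:]]
    return train, test


# Stand-in for the 84 literature values: integer IDs instead of real records.
data = list(range(84))
train, test = holdout_split(data)
train_again, test_again = holdout_split(data)
```

Calling the function twice with the same seed returns the same partition, which is the property that allows the reported metrics to be regenerated.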
The evaluation metrics for the test set are low because the data set does not account for the ester−aromatic interaction. Because not enough experimental values were found for such compounds, 24 aromatic compounds were added to the data set to improve the precision. The result after that consideration was favorable, as shown in Table 9, where the standard molar enthalpies of formation in the gas phase are reported exclusively for the coumarins; furthermore, a Δ parameter was added, which represents the error between the experimental and predicted values.

Table footnotes: The uncertainty corresponds to the combined standard uncertainty and includes the uncertainties of the slope, the rate of mass loss, and the temperature. The uncertainty corresponds to the expanded uncertainty with a level of confidence of 95%.
As observed in Table 9, the MLR and SGD values are quite close to the experimental values, which indicates that the proposed models can be applied to predict the enthalpy of formation of the compound of interest.
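The kind of comparison summarized by the evaluation metrics and the Δ parameter can be sketched as follows. The exact definitions used for the reported error rate are not given in this section, so the formulas below are the common textbook ones and are an assumption; the function names and the three value pairs are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of |experimental - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared deviation."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_percent_error(y_true, y_pred):
    """Average of |experimental - predicted| / |experimental|, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values of -dfH(g) in kJ/mol, for illustration only:
experimental = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae = mean_absolute_error(experimental, predicted)
rmse = root_mean_squared_error(experimental, predicted)
mpe = mean_percent_error(experimental, predicted)
r2 = r_squared(experimental, predicted)
```

Reporting an absolute metric (MAE or RMSE) alongside a relative one is useful here because enthalpies of formation span a wide range of magnitudes across the data set.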
By analyzing the behavior of the coumarins under the two regression approaches, we observed that these theoretical tools are useful for predicting the enthalpy of a desired compound because the two methods deviate from the experimental value by similar amounts.
On the other hand, Benson's method gives a greater error because its data have not been updated and because it does not distinguish between isomers, so it yields the same result for different molecules. Another theoretical route to the enthalpy of formation in the gas phase is the use of homodesmic reactions; thus, it is necessary to propose reactions for 7M4MC, as shown in Figure 3.
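The bookkeeping behind a homodesmic-reaction estimate can be sketched with Hess's law: if the target molecule is a reactant, its enthalpy of formation follows from the reaction enthalpy and the enthalpies of formation of all other species. This section does not state how the reaction enthalpy was obtained, so it appears below as an explicit input; the function name and all numbers are hypothetical.

```python
def target_formation_enthalpy(products, other_reactants, reaction_enthalpy,
                              target_coeff=1.0):
    """Solve drH = sum(nu*dfH, products) - sum(nu*dfH, reactants) for the target.

    products and other_reactants are lists of (stoichiometric coefficient, dfH)
    pairs in kJ/mol; the target is a reactant with coefficient target_coeff.
    """
    sum_products = sum(nu * h for nu, h in products)
    sum_reactants = sum(nu * h for nu, h in other_reactants)
    return (sum_products - sum_reactants - reaction_enthalpy) / target_coeff


# Hypothetical reaction: target + A -> B + C, with made-up enthalpies (kJ/mol).
products = [(1.0, -100.0), (1.0, -50.0)]   # species B and C
other_reactants = [(1.0, -20.0)]           # species A
d_r_h = -5.0                               # assumed reaction enthalpy
dfh_target = target_formation_enthalpy(products, other_reactants, d_r_h)
```

Homodesmic reactions conserve the number and type of bonds on both sides, which is why errors in the individual enthalpies tend to cancel in the difference.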
The enthalpies of formation of the molecules proposed in reactions I, II, III, and IV were predicted by MLR. For the ester compounds, a data set was constructed using a total of 78 gas-phase enthalpy of formation values, whereas for the aromatic compounds, 53 values were used; in both cases, a hold-out split (70/30) was maintained.
The seeds used were 39 and 508 for each type of compound, respectively (evaluation metrics are shown in Table S6 in the Supporting Information and Figures S5 and S6).
Table 10 shows the prediction results for the molecules presented in Figure 3. Table 11 presents the homodesmic reaction results; the difference between the predicted and experimental values for 7M4MC is given in brackets for each homodesmic reaction.
As observed from the results reported in Table 11, although the values from all the proposed reactions are close to the experimental value, the best reaction is III, which involves 1-methoxy-4-methylbenzene. From this analysis, it can be suggested that the use of MLR to predict the enthalpy of formation of organic compounds is fast and reliable compared with the conventional software already in use.50 Although one of the purposes of SGD is to improve the coefficients obtained by MLR, we observed that similar results are obtained with both MLR and SGD, so MLR is also a reliable option for predicting these thermochemical properties.
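Why SGD and a closed-form linear fit can end up with nearly the same coefficients is easy to see on a small example. The sketch below runs per-example gradient updates on a squared-error loss for a one-variable linear model; the learning rate, number of epochs, seed, and the noise-free toy data generated from y = 2x + 1 are all assumptions made for illustration and are not the settings used in this work.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.1, epochs=1000, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on 0.5*(prediction - y)^2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # arbitrary starting point
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # visit the examples in a new order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Update both parameters after every single training example
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Noise-free toy data on the line y = 2x + 1 (hypothetical, for illustration).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

On exact data the iterations settle on the same slope and intercept that least squares would give; on noisy data the two approaches would agree only approximately, depending on the learning rate and the number of epochs.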
To estimate the enthalpy of formation in the crystalline phase, a conventional regression was performed based on the coumarins reported experimentally in this phase, because for esters and aromatics the reported condensed phase is the liquid phase. The regressors considered were the numbers of C, H, and O atoms, together with the variables X3 and X4, which indicate where each of the coumarin radicals binds, as seen in eq 8. The coefficient of determination (R²) was 0.9951. The resulting predictions are listed in Table 12.
where X1 represents the number of H atoms, X2 is the number of O atoms, X3 corresponds to radical 1, and X4 corresponds to radical 2. To identify the radical position, the numbering must begin from the carbonyl group toward the methoxy group, as shown in Figure 1. As can be seen from eq 8, the number of carbon atoms in the compounds does not affect the estimation of the enthalpy of formation in the crystalline phase. The resulting value for 7M4MC was −(492.0 ± 4.1) kJ mol⁻¹ (the uncertainty represents the average absolute error for the coumarins presented in Table 12), which differs by 3.5 kJ mol⁻¹ from the experimental value.
The values predicted for coumarins using this regression are quite close to those reported in the literature, so this type of analysis is a good option when few data are available and a comparison with an experimental value is needed.
Finally, an additional advantage of MLR is that it provides regression coefficients; these coefficients represent a revision and update of the conventional values given by Benson, as shown in Table 13.

■ CONCLUSIONS
The enthalpy of fusion was determined by DSC, and the enthalpy of vaporization was obtained by thermogravimetric analysis. The experimental standard molar enthalpy of formation of 7M4MC in the gas phase, obtained from the standard molar enthalpy of formation in the solid phase and the standard molar enthalpy of sublimation, was −383.6 kJ mol⁻¹. This value is in excellent agreement with the values predicted by the machine learning algorithms, differing by 1.3 kJ mol⁻¹ from MLR and by 1.4 kJ mol⁻¹ from the SGD regression. Building on the experimental results, the enthalpy of formation in the crystalline phase was predicted using a fitting equation that was able to distinguish between structural isomers; although it was applied only to a small data set, it demonstrated a prediction path for cases where limited experimental values are available. Finally, the fitting equation gave a difference of 3.5 kJ mol⁻¹, and the homodesmic reactions made it possible to propose an alternative method capable of predicting the enthalpy of formation in the gas phase: the optimal reaction differed from the experimental value by 1.2 kJ mol⁻¹, and the largest difference was 2.6 kJ mol⁻¹.

■ ASSOCIATED CONTENT
Supporting Information

c Estimated value in the reference at T = 298.15 K.31 d Experimental average value from two experiments using a DSC device. Its uncertainty corresponds to the expanded uncertainty with a level of confidence of approximately 95%, including the contributions from the calibration and u(T) = 0.1 K. The experiments were carried out under average atmospheric pressure (78.8 kPa), u(p) = 1 kPa.

Figure 2. Comparison of experimental and MLR-predicted values of −ΔfHm°(g, 298.15 K).

Figure 3. Homodesmic reactions used in the 7M4MC enthalpy of formation determination.

Table 1. Physical Properties at p° = 0.1 MPa
a Based on the 2021 IUPAC recommendation.30 b Calculated using Advanced Chemistry Development (ACD/Laboratories) Software v11.02.

Table 2. Melting Temperature of 7M4MC a

Table 3. Combustion Experiments for 7M4MC at 298.15 K and p° = 0.1 MPa

[...] Table 1, ΔTc is the corrected temperature rise, ε(cont) is the energy equivalent of the contents of the bomb, ΔUign is the ignition energy, and ΔUIBP is the energy of the isothermal bomb process, which was calculated by ΔUIBP = [ε(calor)(−ΔTc) + ε(cont)(−ΔTc) + (ΔUign) (ΔUcorr)]. ΔUcorr is the correction to standard state and Δcu°(7M4MC) is the massic energy of combustion of 7-methoxy-4-methylcoumarin. The uncertainty corresponds to the expanded uncertainty with a confidence level of 95%.

Table 4. Standard Molar Energy and Enthalpy of Combustion and Standard Molar Enthalpy of Formation in the Solid Phase at 298.15 K

Table 5. Vaporization Enthalpies for 7M4MC a

Table 6. Determination of Enthalpy of Sublimation at 298.15 K a
Column headings: Δ_cr^l H_m(298.15 K) b (kJ mol⁻¹); Δ_l^g H_m(298.15 K) c (kJ mol⁻¹); Δ_cr^l H_m(298.15 K) + Δ_l^g H_m(298.15 K) (kJ mol⁻¹)
a All the uncertainties correspond to twice the combined standard uncertainty. b Value calculated from eqs 5 and 6. c Value calculated from eq 7.

Table 7. Standard Molar Enthalpies of Formation and Sublimation of 7M4MC at 298.15 K a

Table 8. Algorithm Evaluation Metrics
a Mean absolute error. b Root mean squared error.

Table 9. Comparison between Literature and Predicted Values of −ΔfHm°(g, 298.15 K) in kJ mol⁻¹ and Benson
a Values predicted using MLR. b Values predicted using the SGD regression. c Values calculated using Benson. d Taken from ref 45. e Taken from ref 46. f Taken from ref 47. g Taken from ref 48. h Taken from ref 49. i Experimental value of this work.

Table 11. Computational Estimates of the Standard Enthalpy of Formation in the Gas Phase at 298.15 K of the 7M4MC

Table 12. Comparison between Experimental and Predicted Values of −ΔfH°(cr, 298.15 K) Using the Regression in kJ mol⁻¹
b Taken from ref 23. c Taken from ref 53. d Taken from ref 48. e Taken from ref 47. f Taken from ref 46. g Taken from ref 49. h Experimental value of this work.

Table 13. Update of Benson Functional Groups of −ΔfHm°(g, 298.15 K) for Esters and Coumarins in kJ mol⁻¹
a Represents the β0 value.